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Abstract 

In this work, a Bayesian approximate message passing algorithm is proposed for solving the multiple 
measurement vector (MMV) problem in compressive sensing, in which a collection of sparse signal vec- 
tors that share a common support are recovered from undersampled noisy measurements. The algorithm, 
AMP-MMV, is capable of exploiting temporal correlations in the amplitudes of non-zero coefficients, and 
provides soft estimates of the signal vectors as well as the underlying support. Central to the proposed 
approach is an extension of recently developed approximate message passing techniques to the amplitude- 
correlated MMV setting. Aided by these techniques, AMP-MMV offers a computational complexity that 
is linear in all problem dimensions. In order to allow for automatic parameter tuning, an expectation- 
maximization algorithm that complements AMP-MMV is described. Finally, a detailed numerical study 
demonstrates the power of the proposed approach and its particular suitability for application to high- 
dimensional problems. 

I. Introduction 

As the field of compressive sensing (CS) CD-El matures, researchers have begun to explore numerous 
extensions of the classical sparse signal recovery problem, in which a signal with few non-zero coefficients 
is reconstructed from a handful of incoherent linear measurements. One such extension, known as 
the multiple measurement vector (MMV) problem, generalizes the sparse signal recovery, or single 
measurement vector (SMV), problem to the case where a group of measurement vectors has been obtained 
from a group of signal vectors that are assumed to be jointly sparse, sharing a common support. Such a 
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problem has many applications, including magnetoencephalography 0, 0, direction-of-arrival estimation 
and parallel magnetic resonance imaging (pMRI) Q. 

Mathematically, given T length-M measurement vectors, the traditional MMV objective is to recover 
a collection of length- N sparse vectors {x^}J =1 , when M < N. Each measurement vector, yW, is 
obtained as 

y (t) =Ax®+e®, t = l,...,T, (1) 

where A is a known measurement matrix and is corrupting additive noise. The unique feature of 
the MMV problem is the assumption of joint sparsity: the support of each sparse signal vector x^> is 
identical. Oftentimes, the collection of measurement vectors form a time-series, thus we adopt a temporal 
viewpoint of the MMV problem, without loss of generality. 

A straightforward approach to solving the MMV problem is to break it apart into independent SMV 
problems and apply one of the many SMV algorithms. While simple, this approach ignores valuable 
temporal structure in the signal that can be exploited to provide improved recovery performance. Indeed, 
under mild conditions, the probability of recovery failure can be made to decay exponentially as the 
number of timesteps T grows, when taking into account the joint sparsity fifl . 

Another approach (e.g., 0) to the joint-sparse MMV problem is to restate CD) as the block-sparse 
SMV model 

y = V{A)x + e, (2) 

where y = [t/ 1 ' 1 , . . . , y^ T ] , x = [a;^ 1 ^, . . . , a;( T ) T ] , e = [e^ 1 ) 1 ", . . . , e^ T ] , and 1XA) denotes a 
block diagonal matrix consisting of T replicates of A. In this case, x is block-sparse, where the n th 
block (for n = 1, . . . , N) consists of the coefficients {x n , x n+ N, . . . , sc n +(T-i)Jv}- Equivalently, one could 
express CQ) using the matrix model 

Y = AX + E, (3) 

where Y = [y^, . . . , t/ T) ] , X = [ajW, . . . , x^] , and E = [e«, . . . , e^] . Under the matrix model, 
joint sparsity in (H) manifests as row-sparsity in X. Algorithms developed for the matrix MMV problem 
are oftentimes intuitive extensions of SMV algorithms, and therefore share a similar taxonomy. Among 
the different techniques that have been proposed are mixed-norm minimization methods 0, lfT0l - lfT2l . 
greedy pursuit methods 0, |[T3l , lfT4l . and Bayesian methods 0, |[T5l - |[T8l . Existing literature suggests 
that greedy pursuit techniques are outperformed by mixed-norm minimization approaches, which in turn 
are surpassed by Bayesian methods 0, |[T5l , |[T8l . 
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In addition to work on the MMV problem, related work has been performed on a similar problem 
sometimes referred to as the "dynamic CS" problem |[T9l - |[23l . The dynamic CS problem also shares the 
trait of working with multiple measurement vectors, but instead of joint sparsity, considers a situation in 
which the support of the signal changes slowly over time. 

Given the plethora of available techniques for solving the MMV problem, it is natural to wonder what, 
if any, improvements can be made. In this work, we will primarily address two deficiencies evident in the 
available MMV literature. The first deficiency is the inability of many algorithms to account for amplitude 
correlations in the non-zero rows of Incorporating this temporal correlation structure is crucial, not 
only because many real-world signals possess such structure, but because the performance of MMV 
algorithms is particularly sensitive to this structure JSJ, lfl4l . lfT5l . lfl"8l . |[24l . The second deficiency is 
that of computational complexity: while Bayesian MMV algorithms appear to offer the strongest recovery 
performance, it comes at the cost of increased complexity relative to simpler schemes, such as those based 
on greedy pursuit. For high-dimensional datasets, the complexity of Bayesian techniques may prohibit 
their application. 

Our goal is to develop an MMV algorithm that offers the best of both worlds, combining the recovery 
performance of Bayesian techniques, even in the presence of substantial amplitude correlation and apriori 
unknown signal statistics, with the linear complexity scaling of greedy pursuit methods. Aiding us in 
meeting our goal is a powerful algorithmic framework known as approximate message passing (AMP), 
first proposed by Donoho et al. for the SMV CS problem |[25l . In its early SMV formulations, AMP was 
shown to perform rapid and highly accurate probabilistic inference on models with known i.i.d. signal 
and noise priors, and i.i.d. random A matrices |[25l . |[26l . More recently, AMP was extended to the block- 
sparse SMV problem under similar conditions ll27l . While this block-sparse SMV AMP does solve a 
simple version of the MMV problem via the formulation (0, it does not account for intra-block amplitude 
correlation (i.e., temporal correlation in the MMV model). Recently, Kim et al. proposed an AMP-based 
MMV algorithm that does exploit temporal amplitude correlation lPl6l . However, their approach requires 
knowledge of the signal and noise statistics (e.g., sparsity, power, correlation) and uses matrix inversions 
at each iteration, implying a complexity that grows superlinearly in the problem dimensions. 

In this work, we propose an AMP-based MMV algorithm (henceforth referred to as AMP-MMV) 
that exploits temporal amplitude correlation and learns the signal and noise statistics directly from the 
data, all while maintaining a computational complexity that grows linearly in the problem dimensions. 

'Notable exceptions include 11161 . lfl2 1. and 1 18], which explicitly model amplitude correlations. 
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In addition, our AMP-MMV can easily accomodate time-varying measurement matrices A®, implicit 
measurement operators (e.g., FFT A^'), and complex-valued quantities. (These latter scenarios occur in, 
e.g., digital communication |[28l and pMRI [29'].) The key to our approach lies in combining the "turbo 
AMP" framework of |f30l , where the usual AMP factor graph is augmented with additional hidden 
variable nodes and inference is performed on the augmented factor graph, with an EM-based approach 
to hyperparameter learning. Details are provided in Sections [H] [TV] and [V] 

In Section |VIJ we present a detailed numerical study of AMP-MMV that includes a comparison 
against three state-of-the-art MMV algorithms. In order to establish an absolute performance benchmark, 
in Section HIT] we describe a tight, oracle-aided performance lower bound that is realized through a 
support-aware Kalman smoother (SKS). To the best of our knowledge, our numerical study is the first in 
the MMV literature to use the SKS as a benchmark. Our numerical study demonstrates that AMP-MMV 
performs near this oracle performance bound under a wide range of problem settings, and that AMP-MMV 
is especially suitable for application to high-dimensional problems. In what represents a less-explored 
direction for the MMV problem, we also explore the effects of measurement matrix time-variation (cf. 
0). Our results show that measurement matrix time-variation can significantly improve reconstruction 
performance and thus we advocate the use of time-varying measurement operators whenever possible. 

A. Notation 

Boldfaced lower-case letters, e.g., a, denote vectors, while boldfaced upper-case letters, e.g., A, denote 
matrices. The letter t is strictly used to index a timestep, t = 1,2, ... ,T, the letter n is strictly used to 
index the coefficients of a signal, n = 1, . . . , N, and the letter m is strictly used to index the measurements, 
m = 1, . . . , M. The superscript W indicates a timestep-dependent quantity, while a superscript without 
parentheses, such as k , indicates a quantity whose value changes according to some algorithmic iteration 
index k. Subscripted variables such as x„ are used to denote the n th element of the vector The m th 
row of the matrix A is denoted by aJ m , and the transpose (conjugate transpose) by A J (A H ). An M-by- 
M identity matrix is denoted by I M , a length- N vector of ones is given by 1 N and T>(a) designates a 
diagonal matrix whose diagonal entries are given by the elements of the vector a. Finally, CJ\f(a; b, C) 
refers to the complex normal distribution that is a function of the vector a, with mean b and covariance 
matrix C. 
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II. Signal Model 

In this section, we elaborate on the signal model outlined in Section H and make precise our modeling 
assumptions. Our signal model, as well as our algorithm, will be presented in the context of complex- 
valued signals, but can be easily modified to accommodate real-valued signals. 

As noted in Section Jl we consider the linear measurement model £T|), in which the signal G 
at timestep t is observed as g <c M through the linear operator A G C MxN . We assume e® ~ 
CM(0, erf I M ) is circularly symmetric complex white Gaussian noise. We use S = {n\x { n ] + 0} to 
denote the indices of the time-invariant support of the signal, which is assumed to be suitably sparse, 
i.e., \S\ < M@ 

Our approach to specifying a prior distribution for the signal, p({x^}J =l ), is motivated by a desire 
to separate the support, S, from the amplitudes of the non-zero, or "active," coefficients. To accomplish 
this, we decompose each coefficient s„ as the product of two hidden variables: 

= sn ■ tf> O P0tfW£>) = { 6{X l " ^ Sn = (4) 

^ S(x n '), s n = 0, 

where s n G {0, 1} is a binary variable that indicates support set membership, and n S C is a variable 
that provides the amplitude of coefficient Xn . When s n = 0, x$ = and n ^ S, and when s n = 1, 
%n = On an d n € S. To model the sparsity of the signal, we treat each s n as a Bernoulli random 
variable with Pr{s n = 1} = A n < 1. 

In order to model the temporal correlation of signal amplitudes, we treat the evolution of amplitudes 
over time as stationary first-order Gauss-Markov random processes. Specifically, we assume that 0$ 
evolves according to the following linear dynamical system model: 

9® = (1 - a){0t 1] - C) + aw$ + C, (5) 

where £ G C is the mean of the amplitude process, Wn ~ CM(0, p) is a circularly symmetric white 
Gaussian perturbation process, and a G [0, 1] is a scalar that controls the correlation of 0® across time. 
At one extreme, a = 0, the random process is perfectly correlated (0^ = On while at the other 
extreme, a = 1, the amplitudes evolve independently over time. Note that the binary support vector, s, is 
independent of the amplitude random process, {0®}f =1 , which implies that there are hidden amplitude 



2 If the signal being recovered is not itself sparse, it is assumed that there exists a known basis, incoherent with the measurement 
matrix, in which the signal possesses a sparse representation. Without loss of generality, we will assume the underlying signal 
is sparse in the canonical basis. 
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"trajectories", {9 n }f =1 , associated with inactive coefficients. Consequently, 0$ should be thought of as 
the conditional amplitude of Xn , conditioned on s n = 1. 

Under our model, the prior distribution of any signal coefficient, Xn , is a Bernoulli-Gaussian or 
"spike-and-slab" distribution: 

p(x®) = (1 - *nW xi n) + KCN(x$;(,a 2 ), (6) 

where <$(•) is the Dirac delta function and a 2 = is the steady-state variance of 9 { n\ We note that 
when A n < 1, © is an effective sparsity -promoting prior due to the point mass at x$ = 0. 

III. The Support-Aware Kalman Smoother 

Prior to describing AMP-MMV in detail, we first motivate the type of inference we wish to perform. 
Suppose for a moment that we are interested in obtaining a minimum mean square error (MMSE) 
estimate of {x®}f =1 , and that we have access to an oracle who can provide us with the support, S. With 
this knowledge, we can concentrate solely on estimating {6®}J =1 , since, conditioned on S, an MMSE 
estimate of {6®}f =1 can provide an MMSE estimate of {x®}f =1 . For the linear dynamical system of 
(O, the support-aware Kalman smoother (SKS) provides the appropriate oracle-aided MMSE estimator 
of {6®}f =1 ED. The state-space model used by the SKS is: 

0(*) = (1 - a)6^ + a(l N + aw®, (7) 
y (t) = AD(a)0® +e®, (8) 

where s is the binary support vector associated with S. If 0^ is the MMSE estimate returned by the 
SKS, then an MMSE estimate of sW is given by x® = T>(s)0^\ 

The state-space model <[7j), ([8]) provides a useful interpretation of our signal model. In the context of 
Kalman smoothing, the state vector 0® is only partially observable (due to the action of T>(s) in ([8])). 
Since T>(s)0® = x®, noisy linear measurements of x® are used to infer the state 9®. However, since 
only those On for which n € S are observable, and thus identifiable, they are the only ones whose 
posterior distributions will be meaningful. 

Since the SKS performs optimal MMSE estimation, given knowledge of the true signal support, it 
provides a useful lower bound on the achievable performance of any support-agnostic Bayesian algorithm 
that aims to perform MMSE estimation of {x^}f =1 . 
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Fig. 1: Factor graph representation of the decomposition of p(x, 6, s\y) in l[9j. 

IV. The AMP-MMV Algorithm 

In Section [TTJ we decomposed each signal coefficient, x$ , as the product of a binary support variable, 
s n , and an amplitude variable, 0$ . We now develop an algorithm that infers a marginal posterior 
distribution on each variable, enabling both soft estimation and soft support detection. 

The statistical structure of the signal model from Section |H] becomes apparent from a factorization of 
the posterior joint pdf of all random variables. Recalling from (0 the definitions of y and x, and defining 
similarly, the posterior joint distribution factors as follows: 

T / M N \ N 

P (x,e, s \y) oc J] II p(v$\ xit) ) J\p{4 ) \e^\s n ) P {e^\et 1) ) \{p{s n ), (9) 

t=l \m=l n=l / n=l 

where oc indicates equality up to a normalizing constant, and p{9n\0n^) — p(@rP)- A convenient 
graphical representation of this decomposition is given by a factor graph ll32l . which is an undirected 
bipartite graph that connects the pdf "factors" of © with the variables that make up their arguments. 
The factor graph for the decomposition of (© is shown in Fig. Q] The factor nodes are denoted by 
filled squares, while the variable nodes are denoted by circles. In the figure, the signal variable nodes at 
timestep t, {xn }^Li> are depicted as lying in a plane, or "frame", with successive frames stacked one 
after another. Since during inference the measurements {ym} are known observations and not random 
variables, they do not appear explicitly in the factor graph. The connection between the frames occurs 
through the amplitude and support indicator variables, providing a graphical representation of the temporal 
correlation in the signal. For visual clarity, these {#n }t=i an d s n variable nodes have been removed 
from the graph for the intermediate index n, but should in fact be present at every index n = 1, . . . , N. 
The factor nodes in Fig. [TJhave all been assigned alphabetic labels; the correspondence between these 



8 



Factor Distribution Functional Form 



ftn(sn) P{sn) (l-A„) (1 Sn) (A„) S ™ 

TABLE I: The factors, underlying distributions, and functional forms associated with the signal model of Section ITT1 

labels and the distributions they represent, as well as the functional form of each distribution, is presented 
in Table ffl 

A natural approach to performing statistical inference on a signal model that possesses a convenient 
factor graph representation is through a message passing algorithm known as belief propagation ||33"1 . In 
belief propagation, the messages exchanged between connected nodes of the graph represent probability 
distributions. In cycle-free graphs, belief propagation can be viewed as an instance of the sum-product 
algorithm l32ll . allowing one to obtain an exact posterior marginal distribution for each unobserved 
variable, given a collection of observed variables. When the factor graph contains cycles, the same rules 
that define the sum-product algorithm can still be applied, however convergence is no longer guaranteed 
|[32l . Despite this, there exist many problems to which loopy belief propagation [34!] has been successfully 
applied, including inference on Markov random fields ||35ll , LDPC decoding 11361 , and compressed sensing 

m, na, EH-na. 

We now proceed with a high-level description of AMP-MMV, an algorithm that follows the sum- 
product methodology while leveraging recent advances in message approximation f25|. In what follows, 
we use JV->&(") to denote a message that is passed from node a to a connected node b. 



A. Message Scheduling 

Since the factor graph of Fig. Q] contains many cycles there are a number of valid ways to schedule, or 
sequence, the messages that are exchanged in the graph. We will describe two message passing schedules 
that empirically provide good convergence behavior and straightforward implementation. We refer to these 
two schedules as the parallel message schedule and the serial message schedule. In both cases, messages 
are first initialized to agnostic values, and then iteratively exchanged throughout the graph according to 
the chosen schedule until either convergence occurs, or a maximum number of allowable iterations is 
reached. 

Conceptually, both message schedules can be decomposed into four distinct phases, differing only in 
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which messages are initialized and the order in which the phases are sequenced. We label each phase 
using the mnemonics (into), (within), (out), and (across). In phase (into), messages are passed from 
the s n and 0$ variable nodes into frame t. Loosely speaking, these messages convey current beliefs 
about the values of s and In phase (within), messages are exchanged within frame t, producing an 
estimate of using the current beliefs about s and 0® together with the available measurements 
In phase (out), the estimate of x® is used to refine the beliefs about s and 0® by passing messages 
out of frame t. Finally, in phase (across), messages are sent from n to either n t+l ^ or On , thus 
conveying information across time about temporal correlation in the signal amplitudes. 

The parallel message schedule begins by performing phase (into) in parallel for each frame t = 1, . . . , T 
simultaneously. Then, phase (within) is performed simultaneously for each frame, followed by phase 
(out). Next, information about the amplitudes is exchanged between the different timesteps by performing 
phase (across) in the forward direction, i.e., messages are passed from n ^ to n 2 \ and then from n 2 ^ 

(3) (T) 

to n , proceeding until n is reached. Finally, phase (across) is performed in the backward direction, 

(T) (1) 

where messages are passed consecutively from n down to n . At this point, a single iteration of 
AMP-MMV has been completed, and a new iteration can commence starting with phase (into). In this 
way, all of the available measurements, {y^}J = i, are used to influence the recovery of the signal at each 
timestep. 

The serial message schedule is similar to the parallel schedule except that it operates on frames in a 
sequential fashion, enabling causal processing of MMV signals. Beginning at the initial timestep, t = 1, 
the serial schedule first performs phase (into), followed by phases (within) and (out). Outgoing messages 
from the initial frame are then used in phase (across) to pass messages from 0^ to Q n 2 \ The messages 

(2) 

arriving at n , along with updated beliefs about the value of s, are used to initiate phase (into) at 
timestep t = 2. Phases (within) and (out) are performed for frame 2, followed by another round of 
phase (across), with messages being passed forward to n 3 \ This procedure continues until phase (out) 
is completed at frame T. Until now, only causal information has been used in producing estimates of 
the signal. If the application permits smoothing, then message passing continues in a similar fashion, but 

(T) (T— 1] 

with messages now propagating backward in time, i.e., messages are passed from n ' to n , phases 

(T—l) (T—2) 

(into), (within), and (out) are performed at frame T — l, and then messages move from n to n 
The process continues until messages arrive at n l \ at which point a single forward/backward pass has 
been completed. We complete multiple such passes, resulting in a smoothed estimate of the signal. 
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Fig. 2: A summary of the four message passing phases, including message notation and form. 

B. Implementing the Message Passes 

Space constraints prohibit us from providing a full derivation of all the messages that are exchanged 
through the factor graph of Fig. Q] Most messages can be derived by straightforward application of the 
rules of the sum-product algorithm. Therefore, in this sub-section we will restrict our attention to a 
handful of messages in the (within) and (out) phases whose implementation requires a departure from 
the sum-product rules for one reason or another. 

To aid our discussion, in Fig. |2] we summarize each of the four phases, focusing primarily on a 
single coefficient index n at some intermediate frame t. Arrows indicate the direction that messages are 
moving, and only those nodes and edges participating in a particular phase are shown in that phase. 
For the (across) phase we show messages being passed forward in time, and omit a graphic for the 
corresponding backwards pass. The figure also introduces the notation that we adopt for the different 
variables that serve to parameterize the messages. Certain variables, e.g., rf$ and tj$, are accented with 
directional arrows. This is to distinguish variables associated with messages moving in one direction 
from those associated with messages moving in another. For Bernoulli message pdfs, we show only the 
nonzero probability, e.g., A n = Vhn-+8 n { s n = !)• 

Phase (within) entails using the messages transmitted from s n and 0$ to fjp to compute the messages 
that pass between Xn and the {g$} nodes. Inspection of Fig. [2] reveals a dense interconnection between 
the {il''} and {gm} nodes. As a consequence, applying the standard sum-product rules to compute 
the v (t) .(*)(■) messages would result in an algorithm that required the evaluation of multi-dimensional 
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integrals that grew exponentially in number in both N and M. Since we are strongly motivated to apply 
AMP-MMV to high-dimensional problems, this approach is clearly infeasible. Instead, we turn to a 
recently developed algorithm known as approximate message passing (AMP). 

AMP was originally proposed by Donoho et al. E51 as a message passing algorithm designed to 
solve the noiseless SMV CS problem known as Basis Pursuit (min ||x||i s.t. y = Ax), and was subse- 
quently extended [263 to support MMSE estimation under white-Gaussian-noise-corrupted observations 
and generic signal priors of the form p(x) = []j)(x n ) through an approximation of the sum-product 
algorithm. In both cases, the associated factor graph looks identical to that of the (within) segment of 
Fig. [2] Conventional wisdom holds that loopy belief propagation only works well when the factor graph 
is locally tree-like. For general, non-sparse A matrices, the (within) graph will clearly not possess this 
property, due to the many short cycles between the Xn and g„} nodes. Reasoning differently, Donoho 
et al. showed that the density of connections could prove beneficial, if properly exploited. 

In particular, central limit theorem arguments suggest that the messages propagated from the g m nodes 
to the x n nodes under the sum-product algorithm can be well-approximated as Gaussian when the problem 
dimensionality is sufficiently high. Moreover, the computation of these Gaussian-approximated messages 
only requires knowledge of the mean and variance of the sum-product messages from the x n to the c/ m 
nodes. Finally, when |^4 mn | 2 scales as 0(l/M) for all (m, n), the differences between the variances of 
the messages emitted by the x n nodes vanish as M grows large, as do those of the g m nodes when 
N grows large, allowing each to be approximated by a single, common variance. Together, these sum- 
product approximations yield an iterative thresholding algorithm with a particular first-order correction 
term that ensures both Gaussianity and independence in the residual error vector over the iterations. 
The complexity of this iterative thresholding algorithm is dominated by a single multiplication by A 
and A H per iteration, implying a per-iteration computational cost of O(MN) flops. Furthermore, the 
state-evolution equation that governs the transient behavior of AMP shows that the number of required 
iterations does not scale with either M or N, implying that the total complexity is itself 0{MN) flops. 

AMP's suitability for the MMV problem stems from several considerations. First, AMP's probabilistic 
construction, coupled with its message passing implementation, makes it well-suited for incorporation as 
a subroutine within a larger message passing algorithm. In the MMV problem it is clear that p(x) ^ 
~\p{xn^) due to the joint sparsity and amplitude correlation structure, and therefore AMP does not appear 
to be directly applicable. Fortunately, by modeling this structure through the hidden variables s and 6, 
we can exploit the conditional independence of the signal coefficients: p(x\s,6) = Ylp(x$\s n} 9 n ). 
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In particular, we replace the p(x$) that AMP traditionally expects with um (*)(■)> the most recent 
message moving into the (within) segment of Fig. |2] This message represents a "local prior" on Xn 
given the current belief about the hidden variables s n and On, and assumes the Bernoulli-Gaussian form 



(*«) = (1 - *<P)5(x!p) + n®Ctf(x®-,$, (10) 



This "local prior" determines the AMP soft-thresholding functions defined in (ID 1 b - (ID4b of Table JI] 
The derivation of these thresholding functions closely follows those outlined in ll30l . which considered 
the special case of a zero-mean Bernoulli-Gaussian prior. 

Beyond the ease with which AMP is included into the larger message passing algorithm, a second 
factor that favors using AMP is the tremendous computational efficiency it imparts on high-dimensional 
problems. Using AMP to perform the most computationally intensive message passes enables AMP- 
MMV to attain a linear complexity scaling in all problem dimensions. To see why this is the case, note 
that the (into), (out), and (across) steps can be executed in 0{N) flops/timestep, while AMP allows 
the (within) step to be executed in 0(MN) flops/timestep (see (IA4b - (lA8b of Table JI]). Since these 
four steps are executed 0(T) times per AMP-MMV iteration for both the serial and parallel message 
schedules, it follows that AMP-MMV's overall complexity is 

A third appealing feature of AMP is that it is theoretically well-grounded; a recent analysis [40) shows 
that, for Gaussian A in the large-system limit (i.e., M, N — > oo with MIN fixed), the behavior of AMP 
is governed by a state evolution whose fixed points, when unique, correspond to MMSE-optimal signal 
estimates. 

After using AMP to implement phase (within), we must pass messages out of frame t in order to 
update our beliefs about the values of s and 0® in the (out) phase. Applying the sum-product algorithm 
rules to compute the message ^w_ ) .g(t)(-) results in the expression 

^\e^M ] ) = (1 - n^CM&^ct) + n^CMiO^-^ct), (11) 

which is an improper distribution due to the constant (w.r.t. 9$) term CN(0; <j) n t,Ct). This behavior is a 
consequence of the conditional signal model (0]). In particular, when s n = 0, i„ provides no information 

3 The primary computational burden of executing AMP-MMV involves performing matrix-vector products with A and A H , 
allowing it to be easily applied in problems where the measurement matrix is never stored explicitly, but rather is implemented 
implicitly through subroutines. Fast implicit A operators can provide significant computational savings in high-dimensional 
problems; implementing a Fourier transform as a fast Fourier transform (FFT) subroutine, for example, would drop AMP- 
MMV's complexity from <D(TMN) to 0(TN log 2 N). 
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% Define soft-thresholding functions 

F nt (0; c) 4 (1 + 7nt (£ c ))-i 

G nt ft; c) ^ (1 + 7n* (0; c))~ 1 (-§f^) + 7n< ft; c)|F„ (fc c) | 
<t(<M = ^«t(«,c) = iG Bt (*c) 

r ^ i*i 2 +& t} *^+S, f) ^* - c iS, f) i 2 



X cxp 



(Dl) 

(D2) 
(D3) 

(D4) 



% Begin passing messages . . . 

for t = 1, . . . ,T, Vn : 

% Execute the (into) phase . 



TV 



(i-A„)-n t / ? . t (i-^ t ' ) )+A„.n t / ?st ^ t ' ) 



7(t) _ 4?' 



Sn - T ^<t) J 

% Initialize AMP-related variables . . . 

Vm : z x mt = y$,Vn : «„ t = 0, and 4 = 100 ■ £Li «/4 4) 
% Execute the (within) phase using AMP . . . 



(Al) 
(A2) 
(A3) 



for i = 1. 



u i+1 

1 n! 



. , /, Vn, m : 

5Zm=l ^-mn z% mt T" Mni 

J_ i; <+1 
M ^n — 1 ^nt 



2 I+1 
-ml 



„(*) 



T i+1 



M 



end 

-(*) 



% Store current estimate of x 



(t) 



% Execute the (out) phase . . . 

= (i + (_^ T ) 7nt (^. c / + i ) )- 1 

Kn . vC ) = taylo^approx^i*' , <^ t , c t 7 ) 

% Execute the (across) phase from 0^ to 6»£* +1) . 



* =(1 - Q) (^^)(^ + {F) +aC 



(A4) 
(A5) 
(A6) 
(A7) 
(A8) 

(A9) 

(A10) 
(All) 

(A12) 
(A13) 



end 



TABLE II: Message update equations for executing a single forward pass using the serial message schedule. 

about the value of 9$. Roughly speaking, the term CJ\f(0; 4> n t,ct) corresponds to the distribution of 9$ 
conditioned on the case s n = 0. 

As a means of circumventing the improper message pdf above, we will regard our original signal 
model, in which s n € {0, 1}, as the limiting case of a signal model in which s n € {e, 1} with e — s> 0. 
For any fixed, positive e, v m^qw (■) is given by the proper pdf 



mod 



(9$) = (1 - n(?pW)) CM(6®; U nt , Xct) + 0(4*)) CN(9®;<t> nU c t ), 



(12) 
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where 



e 2 ir 



n(7r) ^ - — ^— — . (13) 

(1 — 7Tj + £ 2 VT 

Equation (fT2l ) is a binary Gaussian mixture density. When e « 1, the first Gaussian component is 
extremely broad, and conveys little information about the possible value of 9$. The second component 
is a more informative Gaussian whose mean, 4> nt , and variance, q, are determined by the product of the 
messages {z/ w^ j .b(-)}^_ r The relative mass assigned to each Gaussian component is a function of 
the incoming activity probability vr^ (see (fTOl)). Note that the limiting case of fi(-) is a simple indicator 
function: 



if < vr < 1, 

limQ(vr) = { (14) 

1 if vr = 1. 



When implementing AMP-MMV, we therefore fix e at a small positive value, e.g., e = 1 x 10~ 7 . If 
desired, (fT2l ) could then be used as the outgoing message, however this would present a further difficulty. 
Propagating a Gaussian mixture along a given edge would result in an exponential growth in the number 
of mixture components that would need to be propagated along subsequent edges. To avoid this outcome, 
we collapse our binary Gaussian mixture to a single Gaussian component, an approach sometimes referred 
to as Gaussian sum approximation BT1 . II421 . Since, for e < 1, fi(-) behaves nearly like the indicator 
function in (fT4l . one of the two Gaussian components will typically have negligible mass. For this reason, 
collapsing the mixture to a single Gaussian appears justifiable. 

To carry out the collapsing, we perform a second-order Taylor series approximation of — log ^™°?_^(t) {On}) 
with respect to 9$ about the point <^ n tO This provides the mean, tf n \ and variance, ij^J , of the single 
Gaussian that serves as UAn^w (•)• (See Fig. [2]) In Appendix lAl we summarize the Taylor approximation 
procedure, and in Table UlTI provide the pseudocode function, taylor_approx, for computing £^ and ■ 

With the exception of the messages discussed above, all the remaining messages can be derived using 
the standard sum-product algorithm rules j32l . For convenience, we summarize the results in Table HU 
where we provide a pseudocode implementation of a single forward pass of AMP-MMV using the serial 
message schedule. 

V. Estimating the Model Parameters 

The signal model of Section JI] depends on the sparsity parameters {A n },^ =1 , amplitude parameters £, 
a, and p, and noise variance a\. While some of these parameters may be known accurately from prior 



4 For technical reasons, the Taylor series approximation is performed in R 2 instead of C. 
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function (f, ip) = taylor_approx(7r, <j>, c) 



% Define useful variables: 



a = e 2 (l-Cl(n)) 
a = Q(tF) 



(Tl) 
(T2) 
(T3) 
(T4) 
(T5) 



^^\{i-\)4>\ 2 
B p A_2d(i_i)0,. 



% Compute outputs: 

"f_ (g 2 e~ t '+aa+ti 2 e")c 



(T6) 



e 2 2 e -6 +a5 ( £ 2 + 1 _i ci ,2) 



— 1 i — * — b , 




(T7) 



(T8) 



return (§, 



(T9) 



TABLE III: Pseudocode function for computing a single-Gaussian approximation of | |12I >. 



information, it is likely that many will require tuning. To this end, we develop an expectation-maximization 
(EM) algorithm that couples with the message passing procedure described in Section IIV-AI to provide 
a means of learning all of the model parameters while simultaneously estimating the signal x and its 
support s. 

The EM algorithm R3l is an appealing choice for performing parameter estimation for two primary 
reasons. First and foremost, the EM algorithm is a well-studied and principled means of parameter 
estimation. At every EM iteration, the data likelihood function is guaranteed to increase until convergence 
to a local maximum of the likelihood function occurs ll43l . For multimodal likelihood functions, local 
maxima will, in general, not coincide with the global maximum likelihood (ML) estimator, however a 
judicious initialization can help in ensuring the EM algorithm reaches the global maximum j44|. Second, 
the expectation step of the EM algorithm relies on quantities that have already been computed in the 
process of executing AMP-MMV. Ordinarily, this step constitutes the major computational burden of any 
EM algorithm, thus the fact that we can perform it essentially for free makes our EM procedure highly 
efficient. 

We let T = {A, £, a, p, a 2 } denote the set of all model parameters, and let T k denote the set of 
parameter estimates at the k th EM iteration. Here we have assumed that the binary support indicator 
variables share a common activity probability, A, i.e., Pr{s n = 1} = A Vn. For all parameters except a 2 e 
we use s and as the so-called "missing" data of the EM algorithm, while for a 2 e we use x. 

For the first iteration of AMP-MMV, the model parameters are initialized based on either prior signal 
knowledge, or according to some heuristic criteria. Using these parameter values, AMP-MMV performs 
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either a single iteration of the parallel message schedule, or a single forward/backward pass of the 
serial message schedule, as described in Section IIV-AI Upon completing this first iteration, approximate 
marginal posterior distributions are available for each of the underlying random variables, e.g., p(xn'\y), 
p( s n\y), and p(6n\y)- Additionally, belief propagation can provide pairwise joint posterior distributions, 
e.g., P (e%\et 1} \y), for any variable nodes connected by a common factor node 05 ij. With these 
marginal, and pairwise joint, posterior distributions, it is possible to perform the iterative expectation 
and maximization steps required to maximize p(y\T) in closed-form. We adopt a Gauss-Seidel scheme, 
performing coordinate-wise maximization, e.g., 

A fc+1 = argmaxE Si% [logp(y, s, 0- A, T fc \{A^) 



y,r k 



where k is the iteration index common to both AMP-MMV and the EM algorithm. 

In Table UVl we provide the EM parameter update equations for our signal model. In practice, we found 
that the robustness and convergence behavior of our EM procedure were improved if we were selective 
about which parameters we updated on a given iteration. For example, the parameters a and p are tightly 
coupled to one another, since var{#4 |#n ^} = a 2 p. Consequently, if the initial choices of a and p are 
too small, it is possible that the EM procedure will overcompensate on the first iteration by producing 
revised estimates of both parameters that are too large. This leads to an oscillatory behavior in the EM 
updates that can be effectively combated by avoiding updating both a and p on the same iteration. 

VI. Numerical Study 

In this section we describe the results of an extensive numerical study that was conducted to explore 
the performance characteristics and tradeoffs of AMP-MMV. MATLAB codqj was written to implement 
both the parallel and serial message schedules of Section |TV- A 1 along with the EM parameter estimation 
procedure of Section |V] 

For comparison to AMP-MMV, we tested two other Bayesian algorithms for the MMV problem, 
MSBL |fl"5l and T-MSBLlj |[T8l , which have been shown to offer "best in class" performance on the 
MMV problem. We also included a recently proposed greedy algorithm designed specifically for highly 
correlated signals, subspace-augmented MUSIQj (SA-MUSIC), which has been shown to outperform 
MMV basis pursuit and several correlation-agnostic greedy methods Ifl4l . Finally, we implemented the 

5 Code available at ece.osu.edu/~schniter/turboAMPmmv 

6 Code available at dsp.ucsd.edu/~zhilin/Software.html 

7 Code obtained through personal correspondence with authors. 
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% Define key quantities obtained from AMP-MMV at iteration k: 

E[S ™ |5] = I tit ^InhfuT n =tt (QD 

A n n t= i *■„ j +(i-a„) n t= i(i-T„ ) 

i#>A var{ 0M| 5} = Q^ + ^ J . + ^ 3 (Q2) 

^^EttfWlS] = 4«.(|^ + ^ + ^) (Q3) 

4*' = var{a;l' ) |y} % See Eg) of TableED 
^ = E [x$ | y] % See ES) of TableED 



% EM update equations: 

A fc+1 = £E£=iE[*»lv] (El) 

Afc+i_? £g-i) « r 1 / i y^iv -(i) 

+ EL 2 E£U ^(^» - (1 (E2) 
Qfe+1 = 4iV(T-i) ( fa ^ Vfa 2 + 8JV(T-l)c) (E3) 
where: 

f ^ ^ EL 2 ELi ^{E^)*^- 1 ' | y ]} 

C = ^ Ef =2 E^U W + lAn'l 2 + + lAn _1) | 2 

-2«c{E[el' ) *ei t - 1) | S ]} 

a fc+i - , „ 1 T T Y^ N v {t) + \u (t) \ 2 

+(Q fc)2| ? fe|2 _ 2(1 _ ^^{e^X* -1 ^]} 

-2a fc Kc{/iL i) *C fc } + 2o fc (l - a fc )mc{Ak t_1) *C fe } 
+(l-a fe )(ti t - 1) + |Ak t ~' 1) | 2 ) (E4) 
j fc+1 = TM (ELi llw (t) ~ ^ (t) ll 2 + 1> W ) (E5) 



TABLE IV: EM algorithm update equations for the signal model parameters of Section HI1 

support-aware Kalman smoother (SKS), which, as noted in Section [Till provides a lower bound on the 
achievable MSE of any algorithm. To implement the SKS, we took advantage of the fact that y, x, and 6 
are jointly Gaussian when conditioned on the support, s, and thus Fig. Q] becomes a Gaussian graphical 
model. Consequently, the sum-product algorithm yields closed-form expressions (i.e., no approximations 
are required) for each of the messages traversing the graph. Therefore, it is possible to obtain the desired 
posterior means (i.e., MMSE estimates of x) despite the fact that the graph is loopy B6l Claim 5]. 

In all of our experiments, performance was analyzed on synthetically generated datasets, and averaged 
over 250 independent trials. Since MSBL and T-MSBL were derived for real-valued signals, we used 
a real-valued equivalent of the signal model described in Section UTJ and ran a real-valued version of 
AMP-MMV. Our data generation procedure closely mirrors the one used to characterize T-MSBL in |fl~8l . 
Unless otherwise stated, the measurement matrices were i.i.d. Gaussian random matrices with unit-norm 
columns, T = 4 measurement vectors were generated, the stationary variance of the amplitude process 
was set at a 2 = = 1, and the noise variance a 2 was set to yield an SNR of 25 dB. 
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Three performance metrics were considered throughout our tests. The first metric, which we refer to 
as the time-averaged normalized MSE (TNMSE), is defined as 



support, is the normalized support error rate (NSER), which is defined as the number of indices in which 
the ttue and estimated support differ, normalized by the cardinality of the true support S. The third and 
final metric is runtime, which is an important metric given the prevalence of high-dimensional datasets. 

The algorithms were configured and executed as follows: to obtain support estimates for MSBL, 
T-MSBL, and AMP-MMV, we adopted the technique utilized in [QjO of identifying the K amplitude 
trajectories with the largest £2 norms as the support set, where K = |<S|. Note that this is an optimistic 
means of identifying the support, as it assumes that an oracle provides the true value of K. For this 
reason, we implemented an additional «o«-oracle-aided support estimate for AMP-MMV that consisted 
of those indices n for which p(s n \y) > \. In all simulations, AMP-MMV was given imperfect knowledge 
of the signal model parameters, and refined the initial parameter choices according to the EM update 
procedure given in Table [TV] In particular, the noise variance was initialized at a\ = 1 X 10~ 3 . The 
remaining parameters were initialized agnostically using simple heuristics that made use of sample 
statistics derived from the available measurements, y. Equation (|A9b of Table HT1 was used to produce 
which corresponds to an MMSE estimate of x® under AMP-MMV's estimated posteriors p(xn \y). In 
the course of running simulations, we monitored the residual energy, 

Ef=i \\y {t) - A x {t) \\i and would 

automatically switch the schedule, e.g., from parallel to serial, and/or change the maximum number of 
iterations whenever the residual energy exceeded a noise variance-dependent threshold. The SKS was 
given perfect parameter and support knowledge and was run until convergence. Both MSBL and T-MSBL 
were tuned in a manner recommended by the codes' authors. SA-MUSIC was given the true value of K, 
and upon generating an estimate of the support, <S, a conditional MMSE signal estimate was produced, 
e.g., = E[x®\S,y®]. 

A. Performance Versus Sparsity, M/K 

As a first experiment, we studied how performance changes as a function of the measurements-to- 
active-coefficients ratio, M/K. For this experiment, N = 5000, M = 1563, and T = 4. The activity 
probability, A, was swept over the range [0.096,0.22], implying that the ratio of measurements-to-active - 
coefficients, M/K, ranged from 1.42 to 3.26. 




where x^' is an estimate of x^\ The second metric, intended to gauge the accuracy of the recovered 
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a = 0.1 | N = 5000, M = 1 563, T = 4, SNR = 25 dB 



= 0.1 | N = 5000, M = 1 563, T = 4, SNR = 25 dB 
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Fig. 3: A plot of the TNMSE (in dB), NSER, and runtime of T-MSBL, MSBL, SA-MUSIC, AMP-MMV, and the SKS versus 
M/K. Correlation coefficient 1 — a = 0.90. 



In Fig. |3l we plot the performance when the temporal correlation of the amplitudes is 1 — a = 0.90. 
For AMP-MMV, two traces appear on the NSER plot, with the O marker corresponding to the K-largest- 
trajectory-norm method of support estimation, and the A marker corresponding to the support estimate 
obtained from the posteriors p{s n \y). We see that, when M/K > 2, the TNMSE performance of both 
AMP-MMV and T-MSBL is almost identical to that of the oracle-aided SKS. However, when M/K < 2, 
every algorithm's support estimation performance (NSER) degrades, and the TNMSE consequently grows. 
Indeed, when M/K < 1.50, all of the algorithms perform poorly compared to the SKS, although T-MSBL 
performs the best of the four. We also note the superior NSER performance of AMP-MMV over much of 
the range, even when using p(s n \y) to estimate S (and thus not requiring apriori knowledge of K). From 
the runtime plot we see the tremendous efficiency of AMP-MMV. Over the region in which AMP-MMV is 
performing well (and thus not cycling through multiple configurations in vain), we see that AMP-MMV's 
runtime is more than one order-of-magnitude faster than SA-MUSIC, and two orders-of-magnitude faster 
than either T-MSBL or MSBL. 

In Fig. |4] we repeat the same experiment, but with increased amplitude correlation 1 — a = 0.99. In 
this case we see that AMP-MMV and T-MSBL still offer a TNMSE performance that is comparable to 
the SKS when M/K > 2.50, whereas the performance of both MSBL and SA-MUSIC has degraded 
across-the-board. When M/K < 2.5, the NSER and TNMSE performance of AMP-MMV and T-MSBL 
decay sharply, and all the methods considered perform poorly compared to the SKS. Our finding that 
performance is adversely affected by increased temporal correlation is consistent with the theoretical and 
empirical findings of JSJ, lfT4l . |fT51 , Ifflfl . Interestingly, the performance of the SKS shows a modest 
improvement compared to Fig. |3l reflecting the fact that the slower temporal variations of the amplitudes 
are easier to track when the support is known. 
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a = 0.01 | N = 5000, M = 1563, T = 4, SNR = 25 dB 8 « 0.01 | N = 5000, M = 1 563, T = 4, SNR = 25 dB a = 0.01 | N = 5000, M = 1 563, T = 4, SNR = 25 dB 
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Fig. 4: A plot of the TNMSE (in dB), NSER, and runtime of T-MSBL, MSBL, SA-MUSIC, AMP-MMV, and the SKS versus 
M /K. Correlation coefficient 1 - a = 0.99. 



a = 0.1 N = 5000, M = 1000, k = 0.10, SNR = 25 dB cc = 0.1 N = 5000, M == 1000, X = 0.10, SNR = 25 dB a = 0.1 | N = 5000, M = 1000, ?.= 0.10, SNR - 25 dB 




# of MMVs # of MMVs # of MMVs 

Fig. 5: A plot of the TNMSE (in dB), NSER, and runtime of T-MSBL, MSBL, SA-MUSIC, AMP-MMV, and the SKS versus 
T. Correlation coefficient 1 - a — 0.90. 



B. Performance Versus T 

In a second experiment, we studied how performance is affected by the number of measurement 
vectors, T, used in the reconstruction. For this experiment, we used N = 5000, M = N/5, and A = 0.10 
(M/K = 2). Figure [5] shows the performance with a correlation of 1 — a = 0.90. Comparing to Fig. |3l 
we see that MSBL's performance is strongly impacted by the reduced value of M. AMP-MMV and 
T-MSBL perform more-or-less equivalently across the range of T, although AMP-MMV does so with 
an order-of-magnitude reduction in complexity. It is interesting to observe that, in this problem regime, 
the SKS TNMSE bound is insensitive to the number of measurement vectors acquired. 
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a = 0.05 I N = 5000, M - 1000, T = 4A= 0.0667 



a - 0.05 | N = 5000. M = 1000, J = 4,1- 0.0667 
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Fig. 6: A plot of the TNMSE (in dB), NSER, and runtime of T-MSBL, MSBL, SA-MUSIC, AMP-MMV, and the SKS versus 
SNR. Correlation coefficient 1 — a = 0.95. 



C. Performance Versus SNR 

To understand how AMP-MMV performs in low SNR environments, we conducted a test in which 
SNR was swept from 5 dB to 25 dB|3 The problem dimensions were fixed at N = 5000, M = N/5, and 
T = 4. The sparsity rate, A, was chosen to yield M/K = 3 measurements-per-active-coefficient, and the 
correlation was set at 1 — a = 0.95. 

Our findings are presented in Fig. Both T-MSBL and MSBL operate within 5 - 10 dB of the SKS 
in TNMSE across the range of SNRs, while AMP-MMV operates ps 5 dB from the SKS when the SNR 
is at or below 10 dB, and approaches the SKS in performance as the SNR elevates. We also note that 
using AMP-MMV's posteriors on s n to estimate the support does not appear to perform much worse 
than the K-largest-trajectory-norm method for high SNRs, and shows a slight advantage at low SNRs. 
The increase in runtime exhibited by AMP-MMV in this experiment is a consequence of our decision 
to configure AMP-MMV identically for all experiments; our initialization of the noise variance, cig, was 
more than an order-of-magnitude off over the majority of the SNR range, and thus AMP-MMV cycled 
through many different schedules in an effort to obtain an (unrealistic) residual energy. Runtime could 
be drastically improved in this experiment by using a more appropriate initialization of a\. 



D. Performance Versus Undersampling Rate, N/M 

As mentioned in Section H one of the principal aims of CS is to reduce the number of measurements 
that must be acquired while still obtaining a good solution. In the MMV problem, dramatic reductions 

8 In lower SNR regimes, learning rules for the noise variance are known to become less reliable 1151 . 1181 . Still, for high- 
dimensional problems, a sub-optimal learning rule may be preferable to a computationally costly cross-validation procedure. For 
this reason, we ran all three Bayesian algorithms with a learning rule for the noise variance enabled. 
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a = 0.25 I N == 5000, T = 4, M/K = 3 SNR - 25 dB a = 0.25 | N = 5000, T = 4, M/K - 3 SNR - 25 dB a= 0.25 | N = 5000, T = 4, M/K = 3 SNR = 25 dB 
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Fig. 7: A plot of the TNMSE (in dB), NSER, and runtime of T-MSBL, MSBL, SA-MUSIC, AMP-MMV, and the SKS versus 
undersampling rate, N/M. Correlation coefficient 1 — a = 0.75. 

in the sampling rate are possible. To illustrate this, in Fig. |7] we present the results of an experiment in 
which the undersampling factor, N/M, was varied from 5 to 25 unknowns-per-measurement. Specifically, 
N was fixed at 5000, while M was varied. A was likewise adjusted in order to keep M/K fixed 
at 3 measurements-per-active-coefftcient. In Fig. |7J we see that MSBL quickly departs from the SKS 
performance bound, whereas AMP-MMV, T-MSBL, and SA-MUSIC are able to remain close to the bound 
when N/M < 20. At N/M = 25, both AMP-MMV and SA-MUSIC have diverged from the bound, 
and, while still offering an impressive TNMSE, they are outperformed by T-MSBL. In conducting this 
test, we observed that AMP-MMV's performance is strongly tied to the number of smoothing iterations 
performed. Whereas for other tests, 5 smoothing iterations were often sufficient, in scenarios with a high 
degree of undersampling, (e.g., N/M > 15), 50— 100 smoothing iterations were often required to obtain 
good signal estimates. This suggests that messages must be exchanged between neighboring timesteps 
over many iterations in order to arrive at consensus in severely underdetermined problems. 

E. Performance Versus Signal Dimension, N 

As we have indicated throughout this paper, a key consideration of our method was ensuring that it 
would be suitable for high-dimensional problems. Our complexity analysis indicated that a single iteration 
of AMP-MMV could be completed in 0{TNM) flops. This linear scaling of the complexity with respect 
to problem dimensions gives encouragement that our algorithm should efficiently handle large problems, 
but if the number of iterations required to obtain a solution grows too rapidly with problem size, our 
technique would be of limited practical utility. To ensure that this was not the case, we performed an 
experiment in which the signal dimension, N, was swept logarithmically over the range [100, 10000]. M 
was scaled proportionally such that N/M = 3. The sparsity rate was fixed at A = 0.15 so that M/K 2, 
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Fig. 8: A plot of the TNMSE (in dB), NSER, and runtime of T-MSBL, MSBL, SA-MUSIC, AMP-MMV, and the SKS versus 
signal dimension, N. Correlation coefficient 1 — a = 0.95. 



and the correlation was set at 1 — a = 0.95. 

The results of this experiment are provided in Fig. [8] Several features of these plots are of interest. 
First, we observe that the performance of every algorithm improves noticeably as problem dimensions 
grow from N = 100 to N = 1000, with AMP-MMV and T-MSBL converging in TNMSE performance to 
the SKS bound. The second observation that we point out is that AMP-MMV works extremely quickly. 
Indeed, a problem with NT = 40000 unknowns can be solved accurately in just under 30 seconds. 
Finally, we note that at small problem dimensions, AMP-MMV is not as quick as either MSBL or SA- 
MUSIC, however AMP-MMV scales with increasing problem dimensions more favorably than the other 
methods; at N = 10000 we note that AMP-MMV runs at least two orders-of-magnitude faster than the 
other techniques. 



F. Performance With Time-Varying Measurement Matrices 

In all of the previous experiments, we considered the standard MMV problem CQ), in which all of the 
measurement vectors were acquired using a single, common measurement matrix. While this setup is 
appropriate for many tasks, there are a number of practical applications in which a joint-sparse signal is 
measured through distinct measurement matrices. 

To better understand what, if any, gains can be obtained from diversity in the measurement matrices, 
we designed an experiment that explored how performance is affected by the rate-of-change of the 
measurement matrix over time. For simplicity, we considered a first-order Gauss-Markov random process 
to describe how a given measurement matrix changed over time. Specifically, we started with a matrix 
whose columns were drawn i.i.d. Gaussian as in previous experiements, which was then used as the 
measurement matrix to collect the measurements at timestep t = 1. At subsequent timesteps, the matrix 
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Fig. 9: A plot of the TNMSE (in dB), NSER, and runtime of AMP-MMV and the SKS versus rate-of-change of the measurement 
matrix, (3. Correlation coefficient 1 — a = 0.99. 

evolved according to 

A {t) = (1 -/3)A ( '" 1) +/3U {t \ (15) 

where was a matrix whose elements were drawn i.i.d. Gaussian, with a variance chosen such that 
the column norm of A^' would (in expectation) equal one. 

In the test, /3 was swept over a range, providing a quantitative measure of the rate-of-change of the 
measurement matrix over time. Clearly, (3 = would correspond to the standard MMV problem, while 
/3 = 1 would represent a collection of statistically independent measurement matrices. 

In Fig. [9] we show the performance when N = 5000, N/M = 30, M/K = 2, and the correlation is 
1 — a = 0.99. For the standard MMV problem, this configuration is effectively impossible. Indeed, for 
/3 < 0.03, we see that AMP-MMV is entirely failing at recovering the signal. However, once /3 ?» 0.08, 
we see that the NSER has dropped dramatically, as has the TNMSE. Once (3 > 0.10, AMP-MMV is 
performing almost to the level of the noise. As this experiment should hopefully convince the reader, even 
modest amounts of diversity in the measurement process can enable accurate reconstruction in operating 
environments that are otherwise impossible. 

VII. Conclusion 

In this work we introduced AMP-MMV, a Bayesian message passing algorithm for solving the MMV 
problem (Q]) when temporal correlation is present in the amplitudes of the non-zero signal coefficients. 
Our algorithm, which leverages Donoho, Maleki, and Montanari's AMP framework ||25l . performs rapid 
inference on high-dimensional MMV datasets. In order to establish a reference point for the quality of 
solutions obtained by AMP-MMV, we described and implemented the oracle-aided support-aware Kalman 
smoother (SKS). In numerical experiments, we found a range of problems over which AMP-MMV 
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performed nearly as well as the SKS, despite the fact that AMP-MMV was given crude hyperparameter 
initializations that were refined from the data using an expectation-maximization algorithm. In comparing 
against two alternative Bayesian techniques, and one greedy technique, we found that AMP-MMV 
offers an unrivaled performance-complexity tradeoff, particular in high-dimensional settings. We also 
demonstrated that substantial gains can be obtained in the MMV problem by incorporating diversity 
into the measurement process. Such diversity is particularly important in settings where the temporal 
correlation between coefficient amplitudes is substantial. 



Appendix A 
Taylor Series Approximation of v m ,°? h 



In this appendix we summarize the procedure used to collapse the binary Gaussian mixture of (fL2l ). 

isfZf (f) (6$), to a single Gaussian, „m (0$) = CN{9$; Cn ■> Vv^)- F° r simplicity, we drop the n 

fn — tun J 11 n 

and (i) sub- and superscripts. 

Let 6 r = 9\e{9}, let 9{ = 3m{#}, and let fa and fa be defined similarly. Define 

g(0 r ,0i) ± uf^Or+jOi), 

= (1 - n(5f)) CN(O r + jOf, I fa ic) + «(7t) CM{9 t + jOf, fa c) 

f(9 r ,0i) = - logg(0 r ,0i). 

Our objective is to approximate f(9 r ,0i) using a two-dimensional second-order Taylor series expansion, 
f(9 r ,9i), about the point fa 



f(fa,fa) + (9 r -fa.) 



df_ 



+ 



i) 



d9i 



+ - 



d 2 f 



391 



+ (9 r - fam - fa) 



d 2 f 



dOrdOi 



+ 



d 2 f 



89f 



It can be shown that, for Taylor series expansions about the point <j), 9 ® L = 0(e 2 ) and 



o 2 f a 2 / 



0{e 2 ). Since e <C 1, it is reasonable to therefore adopt a further approximation and assume 



d 2 f 



and !U, 



d 2 f _ d 2 f 



With this approximation, note that 

exp(-/(0 r , 6i)) oc CN(9 r + 
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with 



r - 9 9 f 



b r ij) X TT~ 



(16) 



(17) 



The pseudocode function, taylor_approx, that computes ( fT6l ), ( TTVT ) given the parameters of vj^ (-) is 
provided in Table UlTl 
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